Skip to content

Add --run-in-docker to skill-validator to run Copilot CLI in a docker container#176

Closed
caaavik-msft wants to merge 8 commits intodotnet:mainfrom
caaavik-msft:caaavik/docker-copilot-server
Closed

Add --run-in-docker to skill-validator to run Copilot CLI in a docker container#176
caaavik-msft wants to merge 8 commits intodotnet:mainfrom
caaavik-msft:caaavik/docker-copilot-server

Conversation

@caaavik-msft
Copy link
Contributor

Summary

This PR adds an optional Docker execution mode to skill-validator so agent runs, judges, and setup commands can execute in an isolated container instead of directly on the host machine.

Motivation

The main use case for this is for local development, but it might also be useful for running in CI if we want to build on top of it. I was building some skills and found that when using some weaker models, they made destructive changes to my host system to accomplish the task (e.g. reinstalling .NET). With this, agents and judges run inside a container with only access to the files they need bound to the host machine. This does not add any additional security measures for network isolation.

Implementation

This makes use of the --headless mode for running copilot as described here: https://github.com/github/copilot-sdk/blob/main/docs/guides/setup/backend-services.md.

It requires a GITHUB_TOKEN be present to pass into the container so that it can use that to authenticate to the Copilot API. I have an example in the README which explains that you can get this token with gh auth token. For people with multiple gh accounts (e.g. personal and enterprise), you can also do gh auth token --user <name>.

A Dockerfile is included in the repo to use as the base image:

FROM mcr.microsoft.com/dotnet/sdk:10.0 AS build

ARG COPILOT_SDK_VERSION
RUN dotnet new console -o /tmp/dl \
    && dotnet add /tmp/dl package GitHub.Copilot.SDK --version $COPILOT_SDK_VERSION \
    && dotnet build /tmp/dl -c Release \
    && cp /tmp/dl/bin/Release/net10.0/runtimes/*/native/copilot /usr/local/bin/copilot \
    && chmod +x /usr/local/bin/copilot \
    && rm -rf /tmp/dl

RUN copilot --version

This ensures that we use the exact same Copilot CLI binary that is shipped with the SDK. The SDK version is resolved programmatically inside the SkillValidator so it is kept in sync. It places the copilot binary at /usr/local/bin/copilot inside the container.

To handle path mapping/translation, when running in docker mode, all temp/work directories are placed inside a single directory in the TMP folder, and that entire directory is mounted into the container with read-write. This makes it easy to map paths to and from the host and container equivalent when needed. Skill directories are also mounted into the container with read-only access, and only the directories that are being evaluated will be mounted.

The container uses a randomised port -p 0:4321 which is resolved later using docker port. The container is always cleaned up after finishing, including on ProcessExit and CancelKeyPress events.

Future Extensibility

I have a proof of concept working locally which I chose not to push for now to keep this PR simple which runs all agents inside their own containers rather than having a single container that is used to run all agents and judges. This would help reduce any risks of agents modifying the environment and impacting other evaluations if that sounds desirable, but it does mean that each agent would use a separate CopilotClient rather a single shared CopilotClient.

@caaavik-msft caaavik-msft marked this pull request as ready for review March 6, 2026 05:01
@caaavik-msft caaavik-msft requested a review from a team March 6, 2026 05:01
Copilot AI review requested due to automatic review settings March 6, 2026 05:01
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an optional --run-in-docker execution mode to eng/skill-validator so Copilot CLI runs headlessly inside a Docker container, with host/container path translation and read-only skill mounts to reduce host impact during evaluations.

Changes:

  • Introduce DockerCopilotServer to build/run a Copilot CLI container, map work/skill paths, and exec setup commands in-container.
  • Wire Docker-mode path mapping into agent sessions and judges; add --run-in-docker config/flag and documentation.
  • Add Dockerfile to the build output and new unit tests for mount/path mapping logic.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
eng/skill-validator/src/Services/DockerCopilotServer.cs Implements Docker image build/run, mounts, path translation, and container lifecycle management.
eng/skill-validator/src/Services/AgentRunner.cs Uses Docker CLI URL when enabled; maps paths; runs setup commands via docker exec; maps permission paths back to host.
eng/skill-validator/src/Services/Judge.cs Maps working directory to container path for judge sessions in Docker mode.
eng/skill-validator/src/Services/PairwiseJudge.cs Maps working directory to container path for pairwise judge sessions in Docker mode.
eng/skill-validator/src/Commands/ValidateCommand.cs Adds --run-in-docker, initializes Docker server before runs, and stops it during normal cleanup.
eng/skill-validator/src/Models/Models.cs Adds RunInDocker to ValidatorConfig.
eng/skill-validator/src/SkillValidator.csproj Copies Dockerfile into output so it’s available at runtime.
eng/skill-validator/src/Docker/Dockerfile Builds a base image and installs the Copilot CLI binary from the SDK package.
eng/skill-validator/README.md Documents Docker mode requirements and usage.
eng/skill-validator/tests/DockerCopilotServerTests.cs Adds unit tests for mount computation and host/container path mapping.
Comments suppressed due to low confidence (1)

eng/skill-validator/src/Commands/ValidateCommand.cs:169

  • In Docker mode, DockerCopilotServer.Initialize(...) happens before early-return paths in model validation (invalid model / exception). Since GetSharedClient() can start the Docker container during model validation, returning early here can leak a running container and temp dirs. Wrap the body of Run in a try/finally that always calls StopSharedClient(), dockerServer.StopAsync(), and CleanupWorkDirs() (and/or clear DockerCopilotServer.Instance) even on early exits.
        // Set up DockerCopilotServer with skill directories to mount
        if (config.RunInDocker)
            DockerCopilotServer.Initialize(config.Verbose, allSkills);

        // Validate model early
        try
        {
            var client = await AgentRunner.GetSharedClient(config.Verbose);
            var models = await client.ListModelsAsync();
            var modelIds = models.Select(m => m.Id).ToList();
            var modelsToValidate = new List<string> { config.Model };
            if (config.JudgeModel != config.Model) modelsToValidate.Add(config.JudgeModel);

            foreach (var m in modelsToValidate)
            {
                if (!modelIds.Contains(m))
                {
                    Console.Error.WriteLine($"Invalid model: \"{m}\"\nAvailable models: {string.Join(", ", modelIds)}");
                    return 1;
                }
            }

            Console.WriteLine($"Using model: {config.Model}" +
                (config.JudgeModel != config.Model ? $", judge: {config.JudgeModel}" : "") +
                $", judge-mode: {config.JudgeMode}");
        }
        catch (Exception error)
        {
            Console.Error.WriteLine($"Failed to validate model: {error}");
            return 1;
        }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +179 to +184
if (!Path.IsPathRooted(relativePath) &&
!relativePath.StartsWith(".." + Path.DirectorySeparatorChar, StringComparison.Ordinal) &&
relativePath != "..")
{
return Path.Combine(containerMount, relativePath).Replace("\\", "/");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: This repeats and should be extracted to a helper function.

Also the separators normalization part might be needed regardles of path being rooted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this change, although the non-rooted path returns an exception. If the relative path is rooted, it means that they are on different drives. This code just means that only paths that are mapped inside the container will return without exception.

"run",
"--name", containerName,
"-p", $"0:{InternalPort}", // Map internal port to a random host port
"-e", "GITHUB_TOKEN",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like anything within the container (e.g. misbehaving skill) can now use/flush the token.
cc: @ViktorHofer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked to see how the Copilot SDK uses GitHubToken and it looks like it just passes it in via an environment variable anyways https://github.com/github/copilot-sdk/blob/4e1499dd23709022c720eaaa5457d00bf0cb3977/dotnet/src/Client.cs#L1003-L1006. So I think it's not going to be practical to hide that. I think Copilot CLI would be the only practical place to fix this by having it hide the env var from any child subprocesses it creates. Also with full shell access, a misbehaving skill could always just run gh auth token to get a new token.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a sample here actually showing a way to solve this properly, but it would make things too complicated: https://github.com/github/copilot-sdk/tree/main/test/scenarios/bundling/container-proxy

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Copilot CLI would be the only practical place to fix this by having it hide the env var from any child subprocesses it creates.

Damn, we should definitely fix this in the Copilot SDK. Would you mind filing an issue?

Overall, as long as we don't use this in CI where we have less control over the PAT, this is fine.

Copy link
Member

@JanKrivanek JanKrivanek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me if it's only for local usage, not to be used in PR review pipeline (so opt-in is good here - as contributors cannot change it).

Some of this might be alleviated with the support for msbench that we are working on

@ViktorHofer
Copy link
Member

I assume you now have maintainer permissions. Can you please submit this change again from a non forked branch? Annoying, I know but we don't honor skill-validator changes in forks.

@caaavik-msft
Copy link
Contributor Author

Reopened in #273

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants